Improved mental health policies improves productivity
Author
C21380786 Matthew Bradon
Published
March 25, 2025
Introduction
Mental health has become an increasingly important topic in modern workplaces. Employees in tech companies often face tight deadlines that can exacerbate mental health challenges. This project explores the relationship between workplace benefits and mental health outcomes for employees in the tech sector.
The original goal of this analysis is to determine whether supportive workplace policies such as remote work, access to mental health care options, wellness programs, and flexible leave policies correlate with improved mental health outcomes. Specifically, examing whether these benefits increase the likelihood of employees seeking treatment and reduce self-reported work interference due to mental health conditions. However what was found was negiligble proof and even contradictory evidence that mental health policies had an effect on mental health related work interference.
The dataset used for is the analysis a 2014 Mental Health in Tech Survey found on kaggle (https://www.kaggle.com/datasets/osmi/mental-health-in-tech-survey). Transformations on gender and age were done, as well as an age group was added for better analysis.
Some of the following columns are:
family_history: Do you have a family history of mental illness?
treatment: Have you sought treatment for a mental health condition?
work_interfere: If you have a mental health condition, do you feel that it interferes with your work?
no_employees: How many employees does your company or organization have?
remote_work: Do you work remotely (outside of an office) at least 50% of the time?
tech_company: Does your employer provide mental health benefits?
care_options: Do you know the options for mental health care your employer provides?
wellness_program: Has your employer ever discussed mental health as part of an employee wellness program?
seek_help Does your employer provide resources to learn more about mental health issues and how to seek help?
anonymity: Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?
leave: How easy is it for you to take medical leave for a mental health condition?
mental_health_consequence: Do you think that discussing a mental health issue with your employer would have negative consequences?
phys_health_consequence: Do you think that discussing a physical health issue with your employer would have negative consequences?
coworkers: Would you be willing to discuss a mental health issue with your coworkers?
supervisor: Would you be willing to discuss a mental health issue with your direct supervisor(s)?
mental_health_interview: Would you bring up a mental health issue with a potential employer in an interview?
phys_health_interview: Would you bring up a physical health issue with a potential employer in an interview?
mental_vs_physical: Do you feel that your employer takes mental health as seriously as physical health?
obs_consequence: Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?
Data Preparation and Cleaning
Code
# Load and clean datasetmental_health <-read_csv("survey.csv") %>% janitor::clean_names()
Rows: 1259 Columns: 27
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (25): Gender, Country, state, self_employed, family_history, treatment,...
dbl (1): Age
dttm (1): Timestamp
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Gender has various different unique entries ranging from different ways of representing the two main sexes (M, m, Male, male) to Queer identities. There were some mispellings such as msle and mail. Some of the queer identities include Enby, Genderqueer, Agender, Androgyne, Trans-female. There are different spellings on the same identity i.e trans-female, Trans woman, and Female (trans). Some people also wrote Cis male or female cis. To be able to group programatically the different queer identities would be difficult and not scalable thus they will be grouped into other. Any correct spelling of male or female along with F or M will be grouped as male and female. This kind of problem could be avoided if the instead of taking a string input in the survey you used bulletpoints and if necessary an other text input.
I filtered for rows in the age range of 10 to 100. There were outliers such as 999999, -1, 5 and 8 which are invalid. The age_group column was added to explore how different age groups feel. The age groups are grouped in increments of 5 starting from 10 to 60+.
There is a states column which is states for the USA. However there are non US countires where there is no states value for them.
Data Overview
Code
datatable(mental_health)
Data Exploration
Below is the unique values for each column and the count.
Value counts for: age
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
7 9 6 16 21 51 46 61 75 71 68 85 63 67 82 70 65 55 37 43 39 33 33 21 20 28
44 45 46 47 48 49 50 51 53 54 55 56 57 58 60 61 62 65 72
11 12 12 2 6 4 6 5 1 3 3 4 3 1 2 1 1 1 1
---------------------------
Value counts for: gender
Female Male Other
247 994 10
---------------------------
Value counts for: country
Australia Austria Belgium
21 3 6
Bosnia and Herzegovina Brazil Bulgaria
1 6 4
Canada China Colombia
72 1 2
Costa Rica Croatia Czech Republic
1 2 1
Denmark Finland France
2 3 13
Georgia Germany Greece
1 45 2
Hungary India Ireland
1 10 27
Israel Italy Japan
5 7 1
Latvia Mexico Moldova
1 3 1
Netherlands New Zealand Nigeria
27 8 1
Norway Philippines Poland
1 1 7
Portugal Romania Russia
2 1 3
Singapore Slovenia South Africa
4 1 6
Spain Sweden Switzerland
1 7 7
Thailand United Kingdom United States
1 184 746
Uruguay
1
---------------------------
Value counts for: state
AL AZ CA CO CT DC FL GA IA ID IL IN KS KY LA MA
7 7 138 9 4 4 15 12 4 1 28 27 3 5 1 20
MD ME MI MN MO MS NC NE NH NJ NM NV NY OH OK OR
8 1 22 20 12 1 14 2 3 6 2 3 57 27 6 29
PA RI SC SD TN TX UT VA VT WA WI WV WY <NA>
29 1 5 3 45 44 11 14 3 70 12 1 2 513
---------------------------
Value counts for: self_employed
No Yes <NA>
1091 142 18
---------------------------
Value counts for: family_history
No Yes
762 489
---------------------------
Value counts for: treatment
Not Treated Treated
619 632
---------------------------
Value counts for: work_interfere
Never Often Rarely Sometimes <NA>
212 140 173 464 262
---------------------------
Value counts for: no_employees
1-5 100-500 26-100 500-1000 6-25
158 175 288 60 289
More than 1000
281
---------------------------
Value counts for: remote_work
No Yes
880 371
---------------------------
Value counts for: tech_company
No Yes
226 1025
---------------------------
Value counts for: benefits
Don't know No Yes
407 371 473
---------------------------
Value counts for: care_options
No Not sure Yes
499 313 439
---------------------------
Value counts for: wellness_program
Don't know No Yes
187 837 227
---------------------------
Value counts for: seek_help
Don't know No Yes
363 641 247
---------------------------
Value counts for: anonymity
Don't know No Yes
815 64 372
---------------------------
Value counts for: leave
Don't know Somewhat difficult Somewhat easy Very difficult
561 125 265 97
Very easy
203
---------------------------
Value counts for: mental_health_consequence
Maybe No Yes
476 487 288
---------------------------
Value counts for: phys_health_consequence
Maybe No Yes
273 920 58
---------------------------
Value counts for: coworkers
No Some of them Yes
258 771 222
---------------------------
Value counts for: supervisor
No Some of them Yes
390 349 512
---------------------------
Value counts for: mental_health_interview
Maybe No Yes
207 1003 41
---------------------------
Value counts for: phys_health_interview
Maybe No Yes
555 496 200
---------------------------
Value counts for: mental_vs_physical
Don't know No Yes
574 338 339
---------------------------
Value counts for: obs_consequence
No Yes
1070 181
---------------------------
Value counts for: age_group
10–15 16–20 21–25 26–30 31–35 36–40 41–45 46–50 51–55 56–60 60+
0 22 195 362 339 185 92 30 12 10 4
---------------------------
Country breakdown
Code
# Count responses by countrycountry_counts <- mental_health %>%count(country, sort =TRUE)# Calculate percentagescountry_counts <- country_counts %>%mutate(pct = n /sum(n),pct_label =paste0(round(pct *100, 1), "%") )# Generate packed circle layoutpacking <-circleProgressiveLayout(country_counts$n, sizetype ='area')# Combine layout with datacountry_counts <-bind_cols(country_counts, packing)# Generate circle vertices for plottingcircle_data <-circleLayoutVertices(packing, npoints =50)# Create a label column for text + tooltipcountry_counts <- country_counts %>%mutate(label =ifelse(n >3,paste0(country, "\n", n, " Responses\n", pct_label),"") )# Create plotp_bubble <-ggplot() +geom_polygon(data = circle_data,aes(x, y, group = id, fill =as.factor(id)),color ="white", alpha =0.7, show.legend =FALSE) +geom_text(data = country_counts,aes(x = x, y = y, label = label),size =3, color ="black", fontface ="bold", lineheight =0.9) +coord_equal() +theme_void() +labs(title ="Survey Responses by Country (Packed Bubble Chart)")# Make interactive with proper tooltipggplotly(p_bubble, tooltip ="label")
59.6% of the survey responses come from the USA with United Kingdom (14.7) and Canada (5.7%) being the next two largest. This introduces bias as how americans interact with their mental health in the workplace may be different to how other european countries work as well as eastern countries.
Age Distribution
Code
p1 <-ggplot(mental_health, aes(x = age)) +geom_histogram(binwidth =5, fill ="steelblue", color ="black") +labs(title ="Age Distribution of Survey Respondents",x ="Age",y ="Count")ggplotly(p1)
Most of the dataset is people whos age range from 20 to 40. This means the tech sector is mostly young workers in their careers.
Gender Distribution
Code
p2 <-ggplot(mental_health, aes(x = gender, fill = gender)) +geom_bar() +labs(title ="Gender Distribution of Survey Respondents",x ="Gender",y ="Count",fill ="Gender" ) +theme_minimal()ggplotly(p2)
There is significantly more men than women in the dataset which is to be expected as tech is a male dominated field.
Correlation Heatmap with treatment
Code
# Select variables and convert to numericcor_data <- mental_health %>%select( work_interfere, age, treatment, remote_work, tech_company, care_options, wellness_program, seek_help, anonymity, leave, mental_health_consequence, phys_health_consequence, coworkers, supervisor, mental_vs_physical, obs_consequence ) %>%mutate(across(everything(), ~as.numeric(as.factor(.)))) # convert all to numeric# Compute correlation matrixcor_matrix <-cor(cor_data, use ="complete.obs") %>%round(2)# Interactive heatmap with heatmaplyheatmaply( cor_matrix,main ="Interactive Correlation Heatmap of Workplace Mental Health Factors",colors =colorRampPalette(c("#E46726", "white", "#6D9EC1"))(100),xlab ="", ylab ="",margins =c(60, 60, 40, 20),grid_color ="grey80",plot_method ="plotly")
Most of the moderate correlations have to do with company culture questions for example if your anonymity is protected in regards to mental health your company tends to support mental health related leave.
1. Treatment by Age Group
Code
p3 <-ggplot(mental_health, aes(x = age_group, fill = treatment)) +geom_bar(position ="fill") +labs(title ="Proportion of Treatment by Age Group",x ="Age Group",y ="Proportion") +scale_y_continuous(labels = scales::percent_format()) +theme(axis.text.x =element_text(angle =45, hjust =1))ggplotly(p3)
This was suprising as my assumption would be the younger generations would get treated to mental health as its less stigmatized. But clearly see an increase in the proportion of those who get treated for mental health as the age group increases.
2. Treatment by Gender
Code
p4 <-ggplot(mental_health, aes(x = gender, fill = treatment)) +geom_bar(position ="fill") +labs(title ="Proportion of Treatment by Gender",x ="Gender",y ="Proportion") +scale_y_continuous(labels = scales::percent_format())ggplotly(p4)
I expected the number of treatment for mental health for the genders to be around the same with a slight edge in more females. However I did not expect that 23.76% more females get treated. The Other gender is at 90% however this includes some cis males and cis females and the sample size is 10 which is not enough to draw any real conclusions
As discussed above the majority of the dataset belongs to the USA followed by the UK and Canada. They have similar ratios with the USA being 54.69% treated and 45.31% not treated. I did not expect that the majority would be treated for mental health.
4. Treatment vs Remote Work
Code
p_remote_treatment <- mental_health %>%filter(!is.na(remote_work), !is.na(treatment)) %>%ggplot(aes(x = remote_work, fill = treatment)) +geom_bar(position ="fill") +scale_y_continuous(labels = scales::percent_format()) +labs(title ="Mental Health Treatment vs Remote Work",x ="Remote Work",y ="Proportion",fill ="Sought Treatment" )ggplotly(p_remote_treatment)
No signifigant difference between the rate of being treated and remote work.
5. Work Interference vs Mental Health Benefits
Code
p_benefits_interfere <- mental_health %>%filter(!is.na(tech_company), !is.na(work_interfere)) %>%ggplot(aes(x = tech_company, fill = work_interfere)) +geom_bar(position ="fill") +scale_y_continuous(labels = scales::percent_format()) +labs(title ="Work Interference vs Mental Health Benefits",x ="Company Provides Mental Health Benefits",y ="Proportion",fill ="Interference With Work" )ggplotly(p_benefits_interfere)
When a company has mental health benefits, employees who report Never grows by 2.3% and Often shrinks by 3.87%, Sometimes grows by 4.78% and Rarely shrinks by 3.2%. This suggests that there may be a slight improvement in work interference when a company has good mental health benefits
6. Leave Policy vs Seeking Treatment
Code
p_leave_treatment <- mental_health %>%filter(!is.na(leave), !is.na(treatment)) %>%mutate(leave =factor(leave, levels =c("Don't know","Very difficult","Somewhat difficult","Somewhat easy","Very easy" ))) %>%ggplot(aes(x = leave, fill = treatment)) +geom_bar(position ="fill") +scale_y_continuous(labels = scales::percent_format()) +labs(title ="Ease of Taking Leave vs Seeking Treatment",x ="Perceived Ease of Taking Medical Leave",y ="Proportion",fill ="Sought Treatment" ) +theme(axis.text.x =element_text(angle =45, hjust =1))ggplotly(p_leave_treatment)
It is interesting to see that more people who get treated are in companies that find it hard to get mental health realted issues.
7. Company Mental Health Resources vs Willingness to Talk to Supervisor
Code
p_resources_vs_supervisor <- mental_health %>%filter(!is.na(care_options), !is.na(supervisor)) %>%ggplot(aes(x = care_options, fill = supervisor)) +geom_bar(position ="fill") +scale_y_continuous(labels = scales::percent_format()) +labs(title ="Awareness of Mental Health Care Options vs Willingness to Talk to Supervisor",x ="Aware of Mental Health Care Options",y ="Proportion",fill ="Would Talk to Supervisor" ) +theme(axis.text.x =element_text(angle =45, hjust =1))ggplotly(p_resources_vs_supervisor)
Employees are more willing to talk to their supervisors about mental health if they are aware of company mental health resources. This could be due to the company making their mental health care options clear. It does also stand that if an employee knows the company does not have mental health resources they would not discusss their issues with their supervisor.
Further exploratory with breakdown by age group and gender
1. Treatment by Gender and Age Group
Code
p1 <-ggplot(mental_health, aes(x = age_group, fill = treatment)) +geom_bar(position ="fill") +facet_wrap(~ gender) +scale_y_continuous(labels = scales::percent_format()) +labs(title ="Mental Health Treatment by Age Group and Gender",x ="Age Group",y ="Proportion",fill ="Treatment Status" ) +theme(axis.text.x =element_text(angle =45, hjust =1))ggplotly(p1)
Above we explored mental health treatement by age group and gender individually and found it generally increased with age and females got treated more. It is expected when visualizing by age group and gender we see the exact same trend. What is interesting is the majority of females getting treated, there is little growth in treatment as age increases compared to the male counterpart.
2. Work Interference by Gender and Age Group
Code
p2 <- mental_health %>%filter(!is.na(work_interfere), !is.na(age_group), !is.na(gender)) %>%ggplot(aes(x = age_group, fill = work_interfere)) +geom_bar(position ="fill") +facet_wrap(~ gender) +scale_y_continuous(labels = scales::percent_format()) +labs(title ="Perceived Work Interference by Age Group and Gender",x ="Age Group",y ="Proportion",fill ="Work Interference" ) +theme(axis.text.x =element_text(angle =45, hjust =1))ggplotly(p2)
Besides the later age groups (which have less people seen in the age distribution) there does not seem to be a visible pattern to work interference and age group and gender
As mentioned above there was the trend with the number of people treated seemed to decrease with the increase ease of mental health leave. However when viewed through the lens of gender we see this true for males but not for females.
4. Remote Work and Treatment by Age Group
Code
p4 <-ggplot(mental_health, aes(x = remote_work, fill = treatment)) +geom_bar(position ="fill") +facet_wrap(~ age_group) +scale_y_continuous(labels = scales::percent_format()) +labs(title ="Remote Work vs Treatment by Age Group",x ="Remote Work",y ="Proportion",fill ="Treatment" )ggplotly(p4)
Again remote work does not seem to have any effect on the proportion of treatment even when broken into age groups.
Big Idea
Employers who provide strong mental health workplace benefits such as remote work, supportive leave and access to mental health resources create an environment where employees are more likely to seek mental health treatment and report lower levels of work interference. This is important as work interference effects productivity. Companies can take actionable step in the form of workplace policeis that improve both human and business outcomes.
However from the exploration of the data, it has shown that supportive mental health policies had a negligible effect on workplace interference and likelihood of employee going for treatment. The below visualizations will show how these policies effect work interference.
Explanatory Visualizations
From this analysis implemnting supportive mental health policies in a workplace has negligble effect on work interference related to mental health.
Interpretation:
Employees who are receiving mental health treatement are much more likely to have work interference due to mental health. 50.4% of not treated employees never have work interference while only 5 percent of treated employees never have work interference due to mental health. My intuition would be that people who are being treated for mental health would be less likely to have mental health work interference. However people who arent being treated for mental health may not have mental health problems and it would make sense that they would not have interference if they did not have mental health problems. Also people who are being treated for mental health do have mental health problems and it that cohourt of the sample would have more mental health related work interference. For better analysis surveying people who have mental issues
Code
p2_data <- mental_health %>%filter(!is.na(tech_company), !is.na(work_interfere), !is.na(gender)) %>%filter(gender !="Other") %>%# Exclude "Other" gender categorygroup_by(gender, tech_company, work_interfere) %>%summarise(n =n(), .groups ="drop") %>%group_by(gender, tech_company) %>%mutate(total =sum(n),prop = n / total,se =sqrt(prop * (1- prop) / total),ci_low = prop -1.96* se,ci_high = prop +1.96* se)# Plot with facetsp2 <-ggplot(p2_data, aes(x = tech_company, y = prop, fill = work_interfere)) +geom_col(position =position_dodge(width =0.9)) +geom_errorbar(aes(ymin = ci_low, ymax = ci_high),position =position_dodge(width =0.9),width =0.2) +facet_wrap(~ gender) +scale_y_continuous(labels = scales::percent_format()) +scale_fill_brewer(palette ="RdYlBu", direction =-1) +labs(title ="Impact of Company Mental Health Benefits on Work Interference",subtitle ="Includes 95% CI; grouped by gender and company benefit",x ="Company Provides Mental Health Benefits",y ="Proportion",fill ="Work Interference" ) +theme_minimal()ggplotly(p2)
Interpretation:
This graph shows how company benefits and work interference relate to each other broken down by gender. An interesting fact to note is that uncertainty for the females is much higher than it is for males. This could be due to females make up 19.74% of the survey and with a 4 times smaller sample size introducing more uncertainty. In data exploration we saw that there was a very slight improvement in work interference when there are mental health benefits. But when account for uncertainty and doing a breakdown by gender that improvement seems to be negligble. It is interesting that for females, work interference appears to be worse with those who report Never decreasing by 5.1%. Males however appear to report less work interference with the mental health benefits. Due to the very small improvements and the large amount of uncertainty when broken down by gender I cannot conclude that work place benefits effect work interference.
Code
p3 <- mental_health %>%filter(!is.na(leave), !is.na(work_interfere), !is.na(age_group)) %>%filter(!age_group %in%c("16–20", "60+")) %>%mutate(leave =factor(leave, levels =c("Don't know","Very difficult","Somewhat difficult","Somewhat easy","Very easy" ))) %>%ggplot(aes(x = leave, fill = work_interfere)) +geom_bar(position ="fill") +facet_wrap(~ age_group, ncol =3) +scale_y_continuous(labels = scales::percent_format()) +scale_fill_brewer(palette ="RdYlBu", direction =-1) +labs(title ="Work Interference Across Leave Policies by Age Group",subtitle ="Younger workers are more affected by poor leave policies",x ="Ease of Taking Leave for Mental Health",y ="Proportion",fill ="Work Interference" ) +theme_minimal() +theme(axis.text.x =element_text(angle =90, hjust =1),panel.spacing =unit(1.5, "lines") )ggplotly(p3)
Interpretation:
In this visualization some of the age groups were excluded due to the very small sample size. Across each age group there is no discernable pattern amerging. The proportion of those who report never appears to grow larger as taking leave becomes easier but in some age groups so does the proportion of Often. This could just be random noise and not an actual pattern in the dataset.
Conclusion
In conclusion, I set out to see how mental health polices reduce work intereference due to mental health however found that it did not have an effect signifigant enough to definitvely say. We did a breakdown by age group and gender which gave a deeper insight such as how benefits vs interference in a single variate analysis showed a slight increase but through the lens of gender the improvement vanished.